PostgreSQL Page页结构解析 B-Tree索引的页头和行数据指针

本文简单介绍了PG索引数据页Page中存储的原始内容以及如何阅读它们,包括页头PageHeader和行数据指针ItemId(Line Pointer)。

测试数据准备

1 创建一张表,插入数据并创建索引

   
 -- 创建一张表,插入几行数据
drop table if exists t_index;
create table t_index (id int,c1 char(8),c2 varchar(16));
insert into t_index values(2,'1','a');
insert into t_index values(4,'2','b');
insert into t_index values(8,'3','c');
insert into t_index values(16,'4','d');

 -- 创建索引
 alter table t_index add constraint pk_t_index primary key(id);

2 获取该索引对应的数据文件

testdb=#  SELECT pg_relation_filepath('pk_t_index');

 pg_relation_filepath 
----------------------
 base/13758/16441

3 Dump数据文件中的数据

[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441
00000000  00 00 00 00 08 9f 79 01  00 00 00 00 48 00 f0 1f  |......y.....H...|
00000010  f0 1f 04 20 00 00 00 00  62 31 05 00 04 00 00 00  |... ....b1......|
00000020  01 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 f0 bf  |................|
00000040  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001ff0  00 00 00 00 00 00 00 00  00 00 00 00 08 00 00 00  |................|
00002000  00 00 00 00 78 9e 79 01  00 00 00 00 28 00 b0 1f  |....x.y.....(...|
00002010  f0 1f 04 20 00 00 00 00  e0 9f 20 00 d0 9f 20 00  |... ...... ... .|
00002020  c0 9f 20 00 b0 9f 20 00  b0 9f 20 00 00 00 00 00  |.. ... ... .....|
00002030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003fb0  00 00 00 00 04 00 10 00  10 00 00 00 00 00 00 00  |................|
00003fc0  00 00 00 00 03 00 10 00  08 00 00 00 00 00 00 00  |................|
00003fd0  00 00 00 00 02 00 10 00  04 00 00 00 00 00 00 00  |................|
00003fe0  00 00 00 00 01 00 10 00  02 00 00 00 00 00 00 00  |................|
00003ff0  00 00 00 00 00 00 00 00  00 00 00 00 03 00 00 00  |................|
00004000

PageHeader

上一节提到过PageHeaderData,其数据结构如下:

typedef struct PageHeaderData

{

/* XXX LSN is member of *any* block, not only page-organized ones */

PageXLogRecPtr pd_lsn; /* LSN: next byte after last byte of xlog

* record for last change to this page */

uint16 pd_checksum; /* checksum */

uint16 pd_flags; /* flag bits, see below */

LocationIndex pd_lower; /* offset to start of free space */

LocationIndex pd_upper; /* offset to end of free space */

LocationIndex pd_special; /* offset to start of special space */

uint16 pd_pagesize_version;

TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */

ItemIdData pd_linp\[1\]; /* beginning of line pointer array */

}  PageHeaderData;

下面根据数据文件中的数据使用hexdump查看并逐个进行解析。

1 pd_lsn(8bytes)

[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 0 -n 8
00000000  00 00 00 00 08 9f 79 01                           |......y.|
00000008

数据文件的8个Bytes存储的是LSN,其中最开始的4个Bytes是TimelineID,在这里是\x0000 0000(即数字0),后面的4个Bytes是\x01799f08 ,组合起来LSN为0/01799f08

Warning

A、0000000&0000008是hexdump工具的输出,不是数据内容

B、X86使用小端模式,阅读字节码时注意高低位变换

2 pd_checksum(2bytes)

[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 8 -n 2

00000008  00 00                                            |..|

0000000a

checksum为\x0000

3 pd_flags(2bytes)

\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 10 -n 2

0000000a  00 00                                            |..|

0000000c

flags为\\x0000

pd_flags 是标志位,初始化未定义的位,默认为零,并可能在将来使用。

  1. 此页面有空闲空间,且 行指针 位置小于pd_lower 则考虑重用.
  2. 此页面没有空闲空间时,UPDATE无法容纳新版本数据,则设置此位, 标志此页面出现行迁移.

4 pg_lower

[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441 -s 12 -n 2

0000000c  48 00                                            |(.|

0000000e
[xdb@localhost utf8db]$ echo $((0x0048))

lower为\\x0048,十进制值为72

5 pd_upper(2bytes)

\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 14 -n 2

0000000e  f0 1f                                            |\`.|

00000010

\[xdb@localhost utf8db\]$ echo $((0x1ff0))

8176

upper为\\x1f60,十进制为8032

6 pd_special(2bytes)

\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 16 -n 2

00000010  f0 1f                                             |..|
00000012
\[xdb@localhost utf8db\]$ echo $((0x1ff0))
8176
Special Space为\x1ff0,十进制值为8176

7 pd_pagesize_version(2bytes)

\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 18 -n 2

00000012  04 20                                            |. |

00000014

pagesize_version为\\x2004,十进制为8196(即版本4)

8 pd_prune_xid(4bytes)

\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 20 -n 4

00000014  00 00 00 00                                      |....|

00000018

prune_xid为\\x0000,即0

ItemIds

PageHeaderData之后是ItemId数组,每个元素占用的空间为4Bytes,数据结构:

typedef struct ItemIdData

{
unsigned lp_off:15,/* offset to tuple (from start of page) */

         lp_flags:2,/* state of item pointer, see below */

         lp_len:15;/* byte length of tuple */

} ItemIdData;

typedef ItemIdData* ItemId;

1 lp_off

Warning

偏移量有问题,暂不知道问题所在.

\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 24 -n 2

00000018  62 31                                             |b1|
0000001a

取低15位

\[xdb@localhost utf8db\]$ echo $((0x3162 & ~$((1<<15))))

12641
12642 + 16 

表示第1个Item(tuple)从12642开始

2 lp_len

[postgres@node1 ~]$ hexdump -C $PGDATA/base/13758/16441 -s 28 -n 2
0000001c  04 00                                             |..|
0000001e

取高15位

\[xdb@localhost utf8db\]$ echo $((0x0004))

4

表示第1个Item(tuple)的大小为4个字节

3 lp_flags

0x9fd8=1001111111011000
004e= 1001110000000000
取第17-16位,01,即1
当lp_flags 为1时,指定 一个未使用的行指针,是否重用.
当lp_flags 为其他值时,则不会立即重用.

page
16384(bytes)
page...
page
16384(bytes)
page...
page
16384(bytes)
page...
block number
block number
0th
0th
1th
1th
N-th
N-th
heap tuples
(record data)
heap tuples...
index file
index file
pd_lsn
pd_lsn
pd_checmsum
pd_checmsum
pd_flags
pd_flags
pd_lower
pd_lower
pd_upper
pd_upper
pd_special
pd_special
pd_pagesize_
version
pd_pagesize_...
pg_prune_xid
pg_prune_xid
1
1
2
2
tuple 1
tuple 1
line pointers
line pointers
free space
(hole)
free space...
pd_lower
pd_lower
pd_upper
pd_upper
lp_len
lp_len
lp_flags
lp_flags
lp_off
lp_off
2bits
2bits
15 bits
15 bits
15bits
15bits
t_tid
t_tid
t_info
t_info
ip_blkid
ip_blkid
ip_posid
ip_posid
4bytes
4bytes
2bytes
2bytes
2bytes
2bytes
8bytes
8bytes
values
values
4bytes
4bytes
IndexTupleData
IndexTupleData
Bitmap
Bitmap
4bytes
4bytes
8bytes
8bytes
16bytes
16bytes
Special Space
Special Space
2bytes
2bytes
btpo_prev
btpo_prev
btpo_next
btpo_next
btpo_level
btpo_level
btpo_flags
btpo_flags
btpo_cycleid
btpo_cycle...
4bytes
4bytes
4bytes
4bytes
4bytes
4bytes
2bytes
2bytes
2bytes
2bytes
16bytes
16bytes
24bytes
24bytes
8bytes
8bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
4bytes
4bytes
0
0
Viewer does not support full SVG 1.1

小结

以上简单介绍了如何阅读Raw Datafile,包括页头和数据行指针信息,有兴趣的同学可在此基础上实现自己的“pageinspect"。下一节将介绍数据行信息。